Improved Alignment-Based Algorithm for Multilingual Text Compression
Similar resources
Improved Alignment Based Algorithm for Multilingual Text Compression
Multilingual text compression exploits the existence of the same text in several languages to compress the second and subsequent copies by reference to the first. This is done based on bilingual text alignment, a mapping of words and phrases in one text to their semantic equivalents in the translation. A new multilingual text compression scheme is suggested, which improves over an immediate gen...
Using alignment for multilingual text compression
Multilingual text compression exploits the existence of the same text in several languages to compress the second and subsequent copies by reference to the first. We explore the details of this framework and present experimental results for parallel English and French texts.
An Improved Hierarchical Lossless Text Compression Algorithm
Several improvements to the Bugajski-Russo N-gram algorithm are proposed. When applied to English text these result in an algorithm with comparable complexity and approximately 10 to 30% less rate than the commonly used COMPRESS algorithm. I. The N-Gram Algorithm The N-gram algorithm of Bugajski and Russo [1] is a hierarchical dictionary-type universal lossless source coder for a finite source ...
Extending Huffman Coding for Multilingual Text Compression
Traditional text compression algorithms such as Huffman and LZ variants are usually based on 8-bit character sampling. However, under the Unicode representation for multilingual information, the character set of a language such as Chinese or Japanese consists of a very large number of distinct characters, and thus 16-bit or 32-bit character sampling is needed. Consequently, when text compress...
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
Journal
Journal title: Mathematics in Computer Science
Year: 2012
ISSN: 1661-8270,1661-8289
DOI: 10.1007/s11786-012-0138-1